Abstract: For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes
one of the cornerstone tools. However, the computation of its quantiles under the null or the
alternative requires complex analytic approximations and, more importantly, these distributional
approximations are feasible only for a moderate dimension of the dependent variable,
say $p \le 20$. On the other hand, assuming that the data dimension $p$ as well as the number
$q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic
approximations have been proposed in the literature for Wilks' $\Lambda$, including the widely used
chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in
a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample
size $n$. Based on recent random matrix theory, the correction we propose to Wilks' test
is asymptotically Gaussian under the null, and simulations demonstrate that the corrected
LRT has very satisfactory size and power, certainly in the large $p$ and large $n$ context, but
also for moderately large data dimensions like $p = 30$ or $p = 50$. As a byproduct, we give
a reason explaining why the standard chi-square approximation fails for high-dimensional
data. We also introduce a new procedure for the classical multiple sample significance test
in MANOVA which is valid for high-dimensional data.
1. Introduction

In a growing number of science and technology fields, and with the help of rapid developments
in information technology, huge amounts of data are collected in which the number of variables
is usually large. However, most traditional statistical tools depend heavily on the assumption
of a large sample size $n$ compared to the number of variables $p$ (the data dimension). For
high-dimensional data analysis, these classical tools inevitably become inefficient or, even worse,
inconsistent. For decades, statisticians have devoted special efforts to seeking better approaches
in such high-dimensional cases. For the two sample significance test problem in high dimensions, as early
as 1958, Dempster (1958) proposed a so-called non-exact test (NET) as a remedy to the failure
of Hotelling's $T^2$-test. A rigorous analysis of this NET arose much later in Bai and Saranadasa
(1996) using modern random matrix theory (RMT). These authors found the necessary correction
for the $T^2$-test to cope with high-dimensional effects.

Recent work in high-dimensional statistics includes Ledoit and Wolf (2002), Srivastava (2005)
and Schott (2007). These authors propose several procedures in the high-dimensional setting
for testing that i) a covariance matrix is an identity matrix, is proportional to an identity matrix
(sphericity), or is a diagonal matrix, or ii) several covariance matrices are equal. These procedures
share a common feature: their construction involves some well-chosen distance function
between the null and the alternative hypotheses and relies on the first two spectral moments, namely
the statistics $\mathrm{tr}\,S_k$ and $\mathrm{tr}\,S_k^2$ from sample covariance matrices $S_k$. In a recent work, Bai et al. (2009),
we considered likelihood based tests about such high-dimensional covariance matrices, where
the failure of the classical likelihood ratio test is explained using RMT. Necessary corrections to
these LRTs are then introduced to achieve consistency.

This paper pursues the investigation of similar questions but for a multivariate regression model
with high-dimensional data, i.e. the dimension of the dependent variable as well as the number of
regression variables are large compared to the sample size. More precisely, consider a $p$-dimensional
regression model

$$x_i = B z_i + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

where $(\varepsilon_i)$ is a sequence of i.i.d. zero-mean Gaussian noise vectors $N_p(0, \Sigma)$ with covariance matrix $\Sigma$,
$B$ is a $p \times q$ matrix of regression coefficients, and $(z_i)$ is a sequence of known regression variables of
dimension $q$. To simplify the presentation, we always assume that $n > p + q$ and that the rank of
$Z = (z_1, \ldots, z_n)$ equals $q$.
Let us define a block decomposition $B = (B_1, B_2)$ with $q_1$ and $q_2$ columns, respectively
($q = q_1 + q_2$). A general linear hypothesis is defined as

$$H_0 : B_1 = B_1^*, \tag{1.2}$$

where $B_1^*$ is a given matrix. A well-studied example is the special case $B_1^* = 0$, yielding a significance
test for the first $q_1$ regression variables.

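In the general case and under the alternative, the maximum likelihood estimators of $(B, \Sigma)$ are

$$\hat B = \Big(\sum_{i=1}^n x_i z_i'\Big)\Big(\sum_{i=1}^n z_i z_i'\Big)^{-1}, \tag{1.3}$$

and

$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n (x_i - \hat B z_i)(x_i - \hat B z_i)'. \tag{1.4}$$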

The corresponding likelihood maximum equals

$$L_1 = (2\pi)^{-\frac12 pn}\,|\hat\Sigma|^{-\frac12 n}\,e^{-\frac12 pn}.$$

On the other hand, under the null hypothesis, by using a partition $z_i' = (z_{i,1}', z_{i,2}')$ into $q_1$ and $q_2$
variables respectively, the maximum likelihood estimators of $(B_2, \Sigma)$ are

$$\hat B_{20} = \Big(\sum_{i=1}^n y_i z_{i,2}'\Big)\Big(\sum_{i=1}^n z_{i,2} z_{i,2}'\Big)^{-1}, \tag{1.5}$$

and

$$\hat\Sigma_0 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat B_{20} z_{i,2})(y_i - \hat B_{20} z_{i,2})', \tag{1.6}$$

where $y_i = x_i - B_1^* z_{i,1}$. The associated likelihood maximum equals

$$L_0 = (2\pi)^{-\frac12 pn}\,|\hat\Sigma_0|^{-\frac12 n}\,e^{-\frac12 pn}. \tag{1.7}$$
It follows that the likelihood ratio statistic for the test (1.2) equals

$$L_0/L_1 = (\Lambda_n)^{n/2}, \qquad \Lambda_n = \frac{|\hat\Sigma|}{|\hat\Sigma_0|}, \tag{1.8}$$

where $\Lambda_n$ is the celebrated Wilks' $\Lambda$ (Wilks (1932, 1934) and Bartlett (1934)).
Let us define a similar block decomposition for the sum

$$\sum_{i=1}^n z_i z_i' = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

and the matrix

$$A_{11:2} = A_{11} - A_{12} A_{22}^{-1} A_{21}.$$
After some algebraic manipulations, we get (see Anderson (2003), page 302)

$$\Lambda_n = \Big| I + \frac{q_1}{n-q} F \Big|^{-1}, \tag{1.9}$$

where

$$F = \frac{n-q}{q_1}\,(n\hat\Sigma)^{-1}(\hat B_1 - B_1^*)A_{11:2}(\hat B_1 - B_1^*)', \tag{1.10}$$

and $\hat B_1$ is a $p \times q_1$ matrix made of the first $q_1$ columns of $\hat B$.
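As an illustration, the following minimal sketch computes $\log\Lambda_n$ both from the two residual covariance estimates and from the matrix $F$ of (1.10), and lets one check numerically that the two agree. It assumes NumPy is available; the function name `wilks_log_lambda`, the column-wise data layout and the toy dimensions are choices made for this example only.

```python
import numpy as np


def wilks_log_lambda(X, Z, q1, B1_star=None):
    """Return (log Lambda_n, F) for H0: B1 = B1_star.

    X is p x n (one response vector per column), Z is q x n (one regressor
    vector per column); the first q1 rows of Z are the tested regressors.
    """
    p, n = X.shape
    q = Z.shape[0]
    if B1_star is None:
        B1_star = np.zeros((p, q1))

    # Unrestricted fit, (1.3)-(1.4): B_hat = (X Z')(Z Z')^{-1}.
    A = Z @ Z.T
    B_hat = np.linalg.solve(A, Z @ X.T).T
    resid = X - B_hat @ Z
    n_sigma_hat = resid @ resid.T            # n * Sigma_hat

    # Restricted fit under H0, (1.5)-(1.6), applied to y_i = x_i - B1* z_{i,1}.
    Z1, Z2 = Z[:q1], Z[q1:]
    Y = X - B1_star @ Z1
    B20 = np.linalg.solve(Z2 @ Z2.T, Z2 @ Y.T).T
    resid0 = Y - B20 @ Z2
    n_sigma_0 = resid0 @ resid0.T            # n * Sigma_hat_0

    # (1.8) on the log scale; the factors n cancel in the ratio.
    log_lambda = np.linalg.slogdet(n_sigma_hat)[1] - np.linalg.slogdet(n_sigma_0)[1]

    # (1.10): F built from A_{11:2} and the first q1 columns of B_hat.
    A11, A12 = A[:q1, :q1], A[:q1, q1:]
    A21, A22 = A[q1:, :q1], A[q1:, q1:]
    A11_2 = A11 - A12 @ np.linalg.solve(A22, A21)
    D = B_hat[:, :q1] - B1_star
    F = (n - q) / q1 * np.linalg.solve(n_sigma_hat, D @ A11_2 @ D.T)
    return log_lambda, F


# Toy data generated under H0 (the first q1 coefficient columns are zero).
rng = np.random.default_rng(0)
p, n, q, q1 = 5, 60, 6, 2
Z = rng.normal(size=(q, n))
B = np.hstack([np.zeros((p, q1)), rng.normal(size=(p, q - q1))])
X = B @ Z + rng.normal(size=(p, n))

log_lambda, F = wilks_log_lambda(X, Z, q1)
log_lambda_via_F = -np.linalg.slogdet(np.eye(p) + q1 / (n - q) * F)[1]
print(log_lambda, log_lambda_via_F)
```

Working on the log scale with `slogdet` avoids overflow or underflow of the determinants when $p$ is large.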

It is known that $n\hat\Sigma \sim W_p(\Sigma, n-q)$, a Wishart distribution. Moreover, under $H_0$,

$$(\hat B_1 - B_1^*)A_{11:2}(\hat B_1 - B_1^*)' \sim W_p(\Sigma, q_1),$$

and this statistic is independent of $\hat\Sigma$. Therefore, $H_0$ will be rejected if $\Lambda_n \le \lambda_0$ for some critical
value $\lambda_0$, or equivalently, when the matrix $F$ has some large enough eigenvalues.

Under the Gaussian assumptions made here, the exact distribution of $\Lambda_n$ is known under the null
hypothesis. However, in practice it is usually a difficult task to compute the critical value $\lambda_0$, even
for moderately large $p$ and $q$. For example, Mathai (1971) used complex analytical approximations
and established tables for critical values with $p$ and $q$ smaller than 12.

On the other hand, in a large $n$ asymptotic scheme, one assumes that $p$ and $q$ are fixed, and then the
null distribution of $-n\log\Lambda_n$ is approximated by a $\chi^2_{pq_1}$ distribution. Note that for this chi-squared
approximation, one generally uses a rescaled LRT statistic

$$U_n = -k\log\Lambda_n, \qquad k = n - q - \frac12 (p - q_1 + 1). \tag{1.11}$$

This correction is known as the Bartlett-Box correction (hereafter BBC) due to Box (1949), and it is
much less biased than the classical LRT $-n\log\Lambda_n$; see Section 3.3 for a detailed comparison.
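The rescaling in (1.11) is a one-line computation. The sketch below is only an illustration: the function name and the input value of $\log\Lambda_n$ are made up for the example, and the resulting statistic would be compared with a $\chi^2_{pq_1}$ quantile.

```python
def bartlett_box_statistic(log_lambda, p, n, q, q1):
    """Return (U_n, k) from (1.11): U_n = -k * log(Lambda_n)."""
    k = n - q - 0.5 * (p - q1 + 1)
    return -k * log_lambda, k


# Hypothetical value of log(Lambda_n) for the design (p, n, q, q1) = (10, 100, 50, 30).
U_n, k = bartlett_box_statistic(log_lambda=-0.5, p=10, n=100, q=50, q1=30)
degrees_of_freedom = 10 * 30   # p * q1 in the chi-square approximation
print(k, U_n, degrees_of_freedom)
```

For this design $k = 59.5$ while $n = 100$, so the Bartlett-Box statistic is noticeably smaller than the uncorrected $-n\log\Lambda_n$.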

However, for high-dimensional data where the dimensions $p$ and $q_1$ are large compared to the
sample size $n$, the above $\chi^2_{pq_1}$ approximation unfortunately becomes useless. As an example, even
for moderate $p$, $q$ and $n$ with $y_n = p/(n-q)$ close to 1, the celebrated Marčenko-Pastur theorem
tells us that the eigenvalues of $\hat\Sigma$ tend to fill the whole interval $[(1-\sqrt{y_n})^2, (1+\sqrt{y_n})^2]$. Hence, a
non-negligible proportion of these eigenvalues are close to zero. Consequently, any statistic based
on the inverse $\hat\Sigma^{-1}$, like $\Lambda_n$, becomes unstable and non-robust.
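To make the size of this effect concrete, the following sketch evaluates the Marčenko-Pastur interval for the ratio $y_n = p/(n-q)$ in the four designs used later in the simulation study. It assumes an identity population covariance (the standard setting of the Marčenko-Pastur law); the function name is illustrative.

```python
import math


def mp_support(y):
    """Edges (1 - sqrt(y))^2 and (1 + sqrt(y))^2 of the Marcenko-Pastur support, 0 < y <= 1."""
    if not 0.0 < y <= 1.0:
        raise ValueError("y must lie in (0, 1]")
    root = math.sqrt(y)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


# (p, n, q) taken from the simulation designs of Section 3.3.
designs = [(10, 100, 50), (20, 100, 60), (30, 200, 80), (50, 200, 80)]
edges = {d: mp_support(d[0] / (d[1] - d[2])) for d in designs}
for (p, n, q), (lo, hi) in edges.items():
    print(f"p={p}, n={n}, q={q}: y_n={p / (n - q):.3f}, support=[{lo:.3f}, {hi:.3f}]")
```

Already for $p = 20$, $n - q = 40$ the lower edge is below 0.1, so the smallest sample eigenvalues are far from the population value 1.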

In §3, by using modern RMT, we introduce a correction to Wilks' $\Lambda$ to cope with the mentioned
high-dimensional effects. The corrected LRT is asymptotically Gaussian and we will see that it
has very satisfactory size and power, certainly in the large $p$, $q$ and $n$ context, but also for moderate
data dimensions like $p = 30$ or $p = 50$.

Moreover, to assess the power of the corrected LRT, we examine two additional tests based
on statistics of least-squares type as suggested in Bai and Saranadasa (1996). A quite intensive
simulation experiment is then conducted to compare these different procedures for testing (1.2).
Next, in §4, we consider the classical multiple sample significance test problem but with
high-dimensional data. As is well known, this problem can be embedded into a special instance of
the general linear hypothesis (1.2). Therefore, by an application of the general results of §3, we obtain
a valid LRT after the necessary corrections.

All the proofs and technical derivations are postponed to §5.

2. A CLT for linear statistics of random Fisher matrices 

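We first recall a fundamental result from RMT for linear statistics of so-called random Fisher
matrices, which will be used below. For any $p \times p$ square matrix $M$ with real eigenvalues $(\lambda_i^M)$,
$F^M$ denotes the empirical spectral distribution (ESD) of $M$, that is,

$$F^M(x) = \frac{1}{p}\sum_{i=1}^p 1_{\{\lambda_i^M \le x\}}, \qquad x \in \mathbb{R}.$$

We will consider random matrices $(M_n)$ whose ESD $F^{M_n}$ converges, in a sense to be made precise and
when $n \to \infty$, to a limiting spectral distribution (LSD) $F$. If we have to estimate some
parameter of $F$, say $\theta = \int f(x)\,dF(x)$ for some function $f$, it is natural to use the empirical
estimator

$$\hat\theta_n = \int f(x)\,dF^{M_n}(x) = \frac{1}{p}\sum_{i=1}^p f(\lambda_i^{M_n}),$$

which is a so-called linear spectral statistic (LSS) of the random matrix $M_n$.

Let $\{\xi_{ki} \in \mathbb{C},\ i,k = 1,2,\ldots\}$ and $\{\eta_{kj} \in \mathbb{C},\ j,k = 1,2,\ldots\}$ be two independent double
arrays of i.i.d. complex variables with mean 0 and variance 1. Write $\xi_{\cdot i} = (\xi_{1i}, \xi_{2i}, \ldots, \xi_{pi})^T$ and
$\eta_{\cdot j} = (\eta_{1j}, \eta_{2j}, \ldots, \eta_{pj})^T$. Also, for any positive integers $n_1, n_2$, the vectors $(\xi_{\cdot 1}, \ldots, \xi_{\cdot n_1})$ and
$(\eta_{\cdot 1}, \ldots, \eta_{\cdot n_2})$ can be thought of as independent samples of size $n_1$ and $n_2$, respectively, from some
$p$-dimensional distributions. Let $S_1$ and $S_2$ be the associated sample covariance matrices, i.e.

$$S_1 = \frac{1}{n_1}\sum_{i=1}^{n_1}\xi_{\cdot i}\xi_{\cdot i}^* \qquad\text{and}\qquad S_2 = \frac{1}{n_2}\sum_{j=1}^{n_2}\eta_{\cdot j}\eta_{\cdot j}^*.$$

Then, the following so-called F-matrix generalizes the classical Fisher statistic to the present
$p$-dimensional case,

$$V_n = S_1 S_2^{-1}, \tag{2.1}$$

where we assume that $n_2 > p$. Here we use the notation $n = (n_1, n_2)$.

Let us also assume that

$$y_{n_1} = \frac{p}{n_1} \to y_1 \in (0,1), \qquad y_{n_2} = \frac{p}{n_2} \to y_2 \in (0,1). \tag{2.2}$$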
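A small numerical sketch may help fix these objects. It draws two independent samples with real standard Gaussian entries, forms the uncentered sample covariance matrices $S_1$ and $S_2$, and returns the eigenvalues of the F-matrix $S_1 S_2^{-1}$ together with a linear spectral statistic. NumPy, the function names, the real Gaussian entries and the test function $\log(1 + x/2)$ are all assumptions of this example.

```python
import numpy as np


def f_matrix_eigenvalues(p, n1, n2, rng):
    """Eigenvalues of V_n = S1 S2^{-1} for real standard Gaussian entries."""
    xi = rng.standard_normal((p, n1))
    eta = rng.standard_normal((p, n2))
    S1 = xi @ xi.T / n1
    S2 = eta @ eta.T / n2
    # S1 S2^{-1} is similar to the symmetric matrix L^{-1} S1 L^{-T},
    # where S2 = L L^T, so a symmetric eigen-solver can be used.
    L = np.linalg.cholesky(S2)
    M = np.linalg.solve(L, np.linalg.solve(L, S1).T)
    return np.linalg.eigvalsh((M + M.T) / 2)


def linear_spectral_statistic(f, eigenvalues):
    """Empirical estimator (1/p) * sum_i f(lambda_i)."""
    return float(np.mean(f(eigenvalues)))


# Ratios p/n1 = 0.5 and p/n2 = 0.25.
eig = f_matrix_eigenvalues(p=100, n1=200, n2=400, rng=np.random.default_rng(1))
lss = linear_spectral_statistic(lambda x: np.log1p(0.5 * x), eig)
print(eig.min(), eig.max(), eig.mean(), lss)
```

With these ratios the average eigenvalue should be close to $1/(1 - 0.25)$, since the expectation of the inverse of $S_2$ is a multiple of the identity close to $(1 - p/n_2)^{-1} I$.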
Under suitable moment conditions, the ESD $F^{V_n}$ of $V_n$ has an LSD $F_{y_1,y_2}$ with the following density
function, see p. 72 of Bai and Silverstein (2006),

$$\ell(x) = \begin{cases} \dfrac{(1-y_2)\sqrt{(b-x)(x-a)}}{2\pi x\,(y_1 + y_2 x)}, & a \le x \le b, \\[2mm] 0, & \text{otherwise,} \end{cases} \tag{2.3}$$

where

$$a = \Big(\frac{1-h}{1-y_2}\Big)^2, \qquad b = \Big(\frac{1+h}{1-y_2}\Big)^2, \qquad h = \sqrt{y_1 + y_2 - y_1 y_2}.$$
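The limiting density of the F-matrix spectrum is supported on $[a, b]$ with $a, b = (1 \mp h)^2/(1-y_2)^2$ and $h = \sqrt{y_1 + y_2 - y_1 y_2}$. As a sanity check, the sketch below codes this density and integrates it numerically: the total mass should be 1 and the mean should be $1/(1-y_2)$. The function names and the quadrature scheme (a midpoint rule after the substitution $x = (1 + h^2 - 2h\cos t)/(1-y_2)^2$, which removes the square-root behaviour at the edges) are choices made for this illustration.

```python
import math


def f_lsd_params(y1, y2):
    """Return (h, a, b) for the limiting spectral distribution of the F-matrix."""
    h = math.sqrt(y1 + y2 - y1 * y2)
    a = ((1.0 - h) / (1.0 - y2)) ** 2
    b = ((1.0 + h) / (1.0 - y2)) ** 2
    return h, a, b


def f_lsd_density(x, y1, y2):
    """Density of the LSD on (a, b); zero outside the support."""
    _, a, b = f_lsd_params(y1, y2)
    if x <= a or x >= b:
        return 0.0
    return (1.0 - y2) * math.sqrt((b - x) * (x - a)) / (2.0 * math.pi * x * (y1 + y2 * x))


def integrate_against_lsd(g, y1, y2, m=4000):
    """Midpoint rule for the integral of g against the density, in the variable t in (0, pi)."""
    h, _, _ = f_lsd_params(y1, y2)
    total = 0.0
    for k in range(m):
        t = (k + 0.5) * math.pi / m
        x = (1.0 + h * h - 2.0 * h * math.cos(t)) / (1.0 - y2) ** 2
        dx_dt = 2.0 * h * math.sin(t) / (1.0 - y2) ** 2
        total += g(x) * f_lsd_density(x, y1, y2) * dx_dt
    return total * math.pi / m


y1, y2 = 0.5, 0.25
mass = integrate_against_lsd(lambda x: 1.0, y1, y2)
mean = integrate_against_lsd(lambda x: x, y1, y2)
print(mass, mean)
```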

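Let $\mathcal{U}$ be an open subset of the complex plane which contains the interval $[a, b]$ and $\mathcal{A}$ be the set
of analytic functions $f : \mathcal{U} \to \mathbb{C}$. Define the empirical process $G_n := \{G_n(f)\}$ indexed by $\mathcal{A}$,

$$G_n(f) = p \int_{-\infty}^{+\infty} f(x)\,\big[F^{V_n} - F_{y_{n_1},y_{n_2}}\big](dx), \qquad f \in \mathcal{A}. \tag{2.4}$$

Here $F_{y_{n_1},y_{n_2}}$ is the distribution in (2.3) with indexes $y_{n_k}$ (instead of $y_k$), $k = 1, 2$.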



Recently, Zheng (2008) established a general CLT for LSS of large-dimensional F-matrices. The
following theorem is a simplified version quoted from that work. Throughout the paper, $\oint$ denotes a contour
integral along a given contour.

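Theorem 2.1. Let $f_1, \ldots, f_k \in \mathcal{A}$, and assume:

For each $p$, the variables $(\xi_{ij_1})$ and $(\eta_{ij_2})$ are i.i.d., $1 \le i \le p$, $1 \le j_1 \le n_1$, $1 \le j_2 \le n_2$, with
$E\xi_{11} = E\eta_{11} = 0$, $E|\xi_{11}|^4 = E|\eta_{11}|^4 < \infty$, $y_{n_1} = p/n_1 \to y_1 \in (0,1)$, $y_{n_2} = p/n_2 \to y_2 \in (0,1)$.

(i) Real Case. Assume moreover that $(\xi_{ij})$ and $(\eta_{ij})$ are real and $E|\xi_{11}|^2 = E|\eta_{11}|^2 = 1$. Then the
random vector $(G_n(f_1), \ldots, G_n(f_k))$ weakly converges to a $k$-dimensional Gaussian vector
with the mean vector

$$m(f_j) = \lim_{r \to 1_+} \big[(2.5) + (2.6) + (2.7)\big],$$

$$\frac{1}{4\pi i}\oint_{|\zeta|=1} f_j(z(\zeta))\Big[\frac{1}{\zeta - r^{-1}} + \frac{1}{\zeta + r^{-1}} - \frac{2}{\zeta + \frac{y_2}{hr}}\Big]d\zeta \tag{2.5}$$

$$+\ \frac{\beta \cdot y_1(1-y_2)^2}{2\pi i \cdot h^2}\oint_{|\zeta|=1} f_j(z(\zeta))\,\frac{1}{\big(\zeta + \frac{y_2}{hr}\big)^3}\,d\zeta \tag{2.6}$$

$$+\ \frac{\beta \cdot y_2(1-y_2)}{2\pi i \cdot h}\oint_{|\zeta|=1} f_j(z(\zeta))\,\frac{\zeta + \frac{1}{hr}}{\big(\zeta + \frac{y_2}{hr}\big)^3}\,d\zeta, \qquad j = 1, \ldots, k, \tag{2.7}$$

where $z(\zeta) = (1-y_2)^{-2}\big[1 + h^2 + 2h\,\Re(\zeta)\big]$, $h = \sqrt{y_1 + y_2 - y_1 y_2}$, $\beta = E|\xi_{11}|^4 - 3$, and
the covariance function

$$v(f_j, f_\ell) = \lim_{r \to 1_+} \big[(2.8) + (2.9)\big],$$

$$-\frac{1}{2\pi^2}\oint_{|\zeta_1|=1}\oint_{|\zeta_2|=1}\frac{f_j(z(\zeta_1))\,f_\ell(z(\zeta_2))}{(\zeta_1 - r\zeta_2)^2}\,d\zeta_1\,d\zeta_2 \tag{2.8}$$

$$-\ \frac{\beta \cdot (y_1+y_2)(1-y_2)^2}{4\pi^2 h^2}\oint_{|\zeta_1|=1}\frac{f_j(z(\zeta_1))}{\big(\zeta_1 + \frac{y_2}{hr}\big)^2}\,d\zeta_1\oint_{|\zeta_2|=1}\frac{f_\ell(z(\zeta_2))}{\big(\zeta_2 + \frac{y_2}{hr}\big)^2}\,d\zeta_2, \qquad j, \ell \in \{1, \ldots, k\}. \tag{2.9}$$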
(ii) Complex Case. Assume moreover that $(\xi_{ij})$ and $(\eta_{ij})$ are complex and $E(\xi_{11}^2) = E(\eta_{11}^2) = 0$. Then the
conclusion of (i) also holds, except that the means are $\lim_{r\to1_+}[(2.6) + (2.7)]$ and the covariance function
is $\frac12 \lim_{r\to1_+}(2.8) + (2.9)$ with $\beta = E|\xi_{11}|^4 - 2$.

We should point out that Zheng's CLT for F-matrices covers more general situations than those
cited in Theorem 2.1. In particular, the fourth moments $E|\xi_{11}|^4$ and $E|\eta_{11}|^4$ can be different.

The following lemma will be used in §3 for an application of Theorem 2.1 (see (3.5) and (3.6)).
For a proof, see Bai et al. (2009).

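Lemma 2.1. For the function $f(x) = \log(a + bx)$, $x \in \mathbb{R}$, $a, b > 0$, let $(c, d)$ be the unique
solution to the equations

$$c^2 + d^2 = a + \frac{b(1+h^2)}{(1-y_2)^2}, \qquad cd = \frac{bh}{(1-y_2)^2}, \qquad 0 < d < c. \tag{2.10}$$

Analogously, let $\gamma, \eta$ be the constants similar to $(c, d)$ but for the function $g(x) = \log(\alpha + \beta x)$, $\alpha > 0$,
$\beta > 0$. Then, the mean and covariance functions in (2.5) and (2.8) are equal to

$$m(f) = \frac12 \log\frac{(c^2 - d^2)h^2}{(ch - y_2 d)^2},$$

$$v(f, g) = 2\log\frac{c\gamma}{c\gamma - d\eta}.$$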
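The role of the pair $(c, d)$ can be made explicit by a short computation, given here as a sketch rather than as part of the cited proof. With $z(\zeta) = (1-y_2)^{-2}[1 + h^2 + 2h\,\Re(\zeta)]$ and $|\zeta| = 1$, the constants $c > d > 0$ are chosen so that $a + b\,z(\zeta)$ becomes a squared modulus, which splits the logarithm into two pieces that are analytic inside and outside the unit circle, respectively. Here $a$ and $b$ denote the constants of the function $f(x) = \log(a + bx)$ in the lemma, not the edges of the support in (2.3).

```latex
\begin{align*}
|c + d\zeta|^2 &= c^2 + d^2 + 2cd\,\Re(\zeta), \qquad |\zeta| = 1,\\
a + b\,z(\zeta) &= a + \frac{b(1+h^2)}{(1-y_2)^2} + \frac{2bh}{(1-y_2)^2}\,\Re(\zeta).
\end{align*}
% Matching the constant term and the coefficient of Re(zeta) gives
\begin{align*}
c^2 + d^2 = a + \frac{b(1+h^2)}{(1-y_2)^2}, \qquad cd = \frac{bh}{(1-y_2)^2},
\end{align*}
% and adding or subtracting 2cd yields the explicit solution with c > d > 0:
\begin{align*}
(c \pm d)^2 = a + \frac{b(1 \pm h)^2}{(1-y_2)^2}
\quad\Longrightarrow\quad
c, d = \frac12\left[\sqrt{a + \frac{b(1+h)^2}{(1-y_2)^2}} \pm \sqrt{a + \frac{b(1-h)^2}{(1-y_2)^2}}\,\right].
\end{align*}
% Consequently, on the unit circle,
\begin{align*}
f(z(\zeta)) = \log|c + d\zeta|^2 = \log(c + d\zeta) + \log(c + d\zeta^{-1}).
\end{align*}
```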



3. Testing a general linear hypothesis in high-dimensional regressions 
3.1. A corrected LR test 

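The construction of a correct scaling for the LRT statistic $\Lambda_n$ of the test (1.2) will rely on the
CLT of Theorem 2.1. Recall that

$$\Lambda_n = \Big|I + \frac{q_1}{n-q}F\Big|^{-1}, \qquad F = \frac{n-q}{q_1}(n\hat\Sigma)^{-1}(\hat B_1 - B_1^*)A_{11:2}(\hat B_1 - B_1^*)'.$$

Under $H_0$, we have

$$n\hat\Sigma \sim W_p(\Sigma, n-q), \qquad (\hat B_1 - B_1^*)A_{11:2}(\hat B_1 - B_1^*)' \sim W_p(\Sigma, q_1),$$

and they are independent. Consequently, $F$ is exactly distributed as the F-matrix $V_n$ defined in
(2.1), with $n_1 = q_1$ and $n_2 = n - q$, where in addition all the variables are Gaussian.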

Our correction to the LRT statistic $\Lambda_n$ is given in the following theorem.

Theorem 3.1. For the general linear hypothesis (1.2) in the regression model (1.1), let $\Lambda_n$ be
Wilks' LRT statistic given in (1.9). Define also the function

$$f(x) = \log\Big(1 + \frac{q_1}{n-q}\,x\Big),$$

and assume that

$$p \to \infty, \quad q_1 \to \infty, \quad n - q \to \infty, \quad y_{n_1} = \frac{p}{q_1} \to y_1 \in (0,1), \quad y_{n_2} = \frac{p}{n-q} \to y_2 \in (0,1). \tag{3.1}$$

Then, under the null,

$$T_n = v(f)^{-\frac12}\big[-\log\Lambda_n - p \cdot F_{y_{n_1},y_{n_2}}(f) - m(f)\big] \Rightarrow N(0,1), \tag{3.2}$$

where $m(f)$, $v(f)$ and $F_{y_{n_1},y_{n_2}}(f)$ are defined in (3.5), (3.6) and (3.8), respectively.

Before giving a proof, it is worth mentioning that, at first sight, the asymptotic framework
depicted in (3.1) seems complicated. In fact, this is a common set-up in RMT and simply requires
that the degrees of freedom of the underlying Wishart matrices grow to infinity proportionally
to the sample size.

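Proof. Since $F$ can be represented by a Gaussian $V_n$, we have

$$-\log\Lambda_n = \log\Big|I + \frac{q_1}{n-q}V_n\Big| = \sum_{i=1}^p \log\Big(1 + \frac{q_1}{n-q}\lambda_i^{V_n}\Big) = p \int \log\Big(1 + \frac{q_1}{n-q}x\Big)\,dF^{V_n}(x).$$

Define $f(x) = \log\big(1 + \frac{q_1}{n-q}x\big)$; since $y_{n_1} = p/q_1$ and $y_{n_2} = p/(n-q)$, it can also be written as

$$f(x) = \log\Big(1 + \frac{y_{n_2}}{y_{n_1}}\,x\Big). \tag{3.3}$$

From

$$-\log\Lambda_n = p \int f(x)\,dF^{V_n}(x) = p \int f(x)\,d\big(F^{V_n}(x) - F_{y_{n_1},y_{n_2}}(x)\big) + p \cdot F_{y_{n_1},y_{n_2}}(f),$$

where $F_{y_{n_1},y_{n_2}}(f) = \int f(x)\,dF_{y_{n_1},y_{n_2}}(x)$ and $F_{y_{n_1},y_{n_2}}$ is the limiting distribution which has the
density in (2.3) but with $y_{n_k}$ instead of $y_k$, $k = 1, 2$, we get

$$G_n(f) = -\log\Lambda_n - p \cdot F_{y_{n_1},y_{n_2}}(f). \tag{3.4}$$

By Theorem 2.1, $G_n(f)$ weakly converges to a Gaussian variable with mean

$$m(f) = \frac12 \log\frac{(c^2 - d^2)h^2}{(ch - y_2 d)^2} \tag{3.5}$$

and variance

$$v(f) = 2\log\Big(\frac{c^2}{c^2 - d^2}\Big) \tag{3.6}$$

for the real case, where

$$h = \sqrt{y_1 + y_2 - y_1 y_2}, \qquad a, b = \frac{(1 \mp h)^2}{(1-y_2)^2}, \qquad c, d = \frac12\Big[\sqrt{1 + \frac{y_2}{y_1}b} \pm \sqrt{1 + \frac{y_2}{y_1}a}\,\Big], \quad c > d.$$

This is calculated in §5 using Lemma 2.1. For the complex case, the mean $m(f)$ is zero and the
variance is half of $v(f)$. In other words,

$$-\log\Lambda_n - p \cdot F_{y_{n_1},y_{n_2}}(f) \Rightarrow N(m(f), v(f)). \tag{3.7}$$

Here

$$F_{y_{n_1},y_{n_2}}(f) = \frac{y_{n_2}-1}{y_{n_2}}\log c_n + \frac{y_{n_1}-1}{y_{n_1}}\log(c_n - d_n h_n) + \frac{y_{n_1}+y_{n_2}}{y_{n_1}y_{n_2}}\log\Big(\frac{c_n h_n - d_n y_{n_2}}{h_n}\Big), \tag{3.8}$$

where

$$h_n = \sqrt{y_{n_1} + y_{n_2} - y_{n_1}y_{n_2}}, \qquad a_n, b_n = \frac{(1 \mp h_n)^2}{(1-y_{n_2})^2}, \qquad c_n, d_n = \frac12\Big[\sqrt{1 + \frac{y_{n_2}}{y_{n_1}}b_n} \pm \sqrt{1 + \frac{y_{n_2}}{y_{n_1}}a_n}\,\Big], \quad c_n > d_n,$$

is derived in §5 using the density function of $F_{y_{n_1},y_{n_2}}$. Then, letting $q_1 \wedge (n - q_1) \to \infty$, we get

$$T_n = v(f)^{-\frac12}\big[-\log\Lambda_n - p \cdot F_{y_{n_1},y_{n_2}}(f) - m(f)\big] \Rightarrow N(0,1).$$

□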

We call the test based on the statistic $T_n$ and its asymptotic distribution derived in the theorem above
the corrected likelihood ratio test (CLRT) for testing (1.2). Moreover, it is worth noticing that
in the above proof, we used the Gaussian assumption on the entries to identify $F$ with a Gaussian
F-matrix. However, Theorem 2.1 does not need this Gaussian assumption. Therefore, we can expect
(or conjecture) that the asymptotic distribution of $T_n$ in Theorem 3.1, and hence the CLRT, could
be valid more generally. However, the kurtosis parameter $\beta$ appearing in Theorem 2.1 is then no longer
zero, and it will appear in the asymptotic parameters $m(f)$ and $v(f)$ above.
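The corrected statistic only requires $\log\Lambda_n$ and the dimensions $(p, n, q, q_1)$. The following sketch evaluates the centering term (3.8), the mean (3.5), the variance (3.6) and the statistic $T_n$ of (3.2) for real Gaussian data. Two points are assumptions of this illustration rather than statements of the theorem: the limits $y_1, y_2$ are replaced by the finite-sample ratios $y_{n_1} = p/q_1$ and $y_{n_2} = p/(n-q)$ in $m(f)$ and $v(f)$, and the function names are made up.

```python
import math


def clrt_constants(y1, y2):
    """Return (h, c, d) for f(x) = log(1 + (y2/y1) x)."""
    h = math.sqrt(y1 + y2 - y1 * y2)
    a = ((1.0 - h) / (1.0 - y2)) ** 2
    b = ((1.0 + h) / (1.0 - y2)) ** 2
    s_plus = math.sqrt(1.0 + y2 / y1 * b)
    s_minus = math.sqrt(1.0 + y2 / y1 * a)
    return h, 0.5 * (s_plus + s_minus), 0.5 * (s_plus - s_minus)


def clrt_moments(p, n, q, q1):
    """Return (centering, mean, variance) = (F(f) of (3.8), m(f) of (3.5), v(f) of (3.6))."""
    y1, y2 = p / q1, p / (n - q)
    if not (0.0 < y1 < 1.0 and 0.0 < y2 < 1.0):
        raise ValueError("need p < q1 and p < n - q")
    h, c, d = clrt_constants(y1, y2)
    centering = ((y2 - 1.0) / y2 * math.log(c)
                 + (y1 - 1.0) / y1 * math.log(c - d * h)
                 + (y1 + y2) / (y1 * y2) * math.log((c * h - d * y2) / h))
    mean = 0.5 * math.log((c * c - d * d) * h * h / (c * h - y2 * d) ** 2)
    variance = 2.0 * math.log(c * c / (c * c - d * d))
    return centering, mean, variance


def clrt_statistic(log_lambda, p, n, q, q1):
    """T_n of (3.2); compare with standard normal quantiles under the null."""
    centering, mean, variance = clrt_moments(p, n, q, q1)
    return (-log_lambda - p * centering - mean) / math.sqrt(variance)


centering, mean, variance = clrt_moments(p=20, n=100, q=60, q1=50)
print(centering, mean, variance)
```

In use, one would reject $H_0$ at the 5% level when `clrt_statistic(...)` exceeds 1.645; a one-sided rejection region is an assumption of this note, motivated by the fact that $-\log\Lambda_n$ increases under the alternative.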



3.2. Two least-squares based procedures for testing (1.2) 

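To evaluate the corrected LRT, we consider two additional procedures based on least-squares
type statistics as suggested in Bai and Saranadasa (1996). We first need to find the asymptotic
distributions of these statistics.
By (1.3) and the partition of $B$, we obtain

$$\hat B_1 = \sum_{i=1}^n x_i z_{i,1}' A_{11:2}^{-1} - \sum_{i=1}^n x_i z_{i,2}' A_{22}^{-1}A_{21}A_{11:2}^{-1}. \tag{3.9}$$

Let

$$M_{n,1} = \mathrm{tr}\big[(\hat B_1 - B_1^*)(\hat B_1 - B_1^*)'\big], \tag{3.10}$$
$$M_{n,2} = \mathrm{tr}\big[(\hat B_1 - B_1^*)A_{11:2}(\hat B_1 - B_1^*)'\big]. \tag{3.11}$$

Because $\hat B$ is an unbiased estimator of $B$, we have $E\hat B_1 = B_1^*$ under the null hypothesis. Thus

$$EM_{n,1} = \mathrm{tr}(\Sigma)\,\mathrm{tr}(A_{11:2}^{-1}), \tag{3.12}$$
$$EM_{n,2} = q_1\,\mathrm{tr}(\Sigma), \tag{3.13}$$
$$\sigma_{n,1}^2 = \mathrm{Var}(M_{n,1}) = 2\,\mathrm{tr}(\Sigma^2)\,\mathrm{tr}(A_{11:2}^{-2}) + \beta_x\beta_{z1}, \tag{3.14}$$
$$\sigma_{n,2}^2 = \mathrm{Var}(M_{n,2}) = 2q_1\,\mathrm{tr}(\Sigma^2) + \beta_x\beta_{z2}, \tag{3.15}$$

where

$$\beta_x = E(\varepsilon_1'\varepsilon_1)^2 - (\mathrm{tr}(\Sigma))^2 - 2\,\mathrm{tr}(\Sigma^2),$$
$$\beta_{z1} = \sum_{i=1}^n \big[(z_{i,1}' - z_{i,2}'A_{22}^{-1}A_{21})A_{11:2}^{-2}(z_{i,1} - A_{12}A_{22}^{-1}z_{i,2})\big]^2,$$
$$\beta_{z2} = \sum_{i=1}^n \big[(z_{i,1}' - z_{i,2}'A_{22}^{-1}A_{21})A_{11:2}^{-1}(z_{i,1} - A_{12}A_{22}^{-1}z_{i,2})\big]^2.$$

Define

$$Z_i^{(k)} = A_{11:2}^{(k-3)/2}\,(z_{i,1} - A_{12}A_{22}^{-1}z_{i,2}), \qquad k = 1, 2. \tag{3.16}$$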
Theorem 3.2. Assume that

1. $\min(q_1, p, n - q) \to \infty$;

2. As p -> oo, tr£? = o{{trH) 2 ); 

3. max^ fc) '^ fc) = (^A u (2 -' £) ]); 

l<i<n 

4- (ci), i = !,■■■ ,n are i.i.d. zero-mean random vectors such that for any rj > 0, there exists 
a K > 0, such that 

E{e' 1 e 2 ) 2 < K(trS 2 ), 

maxi?( £ ' l£2 ) 2 / (V1E2I > V\/ 'trA~^t k) trV 2 /\Z^' = o(n 2 {trT 2 )) , 

E&E! - trX) 2 < A"(ir£ 2 ), 

Eftex - trV) 2 I (V1E1 - trZ\ > n^ zk trY, 2 ^ ] \) = o{n 2 {trH 2 )). 
Then for $k = 1, 2$ and under $H_0$ in (1.2),

$$T_{n,k} := \frac{M_{n,k} - EM_{n,k}}{\sigma_{n,k}} \Rightarrow N(0, 1).$$

Consequently, to test (1.2), we can use either of the statistics $T_{n,1}$ and $T_{n,2}$. These tests will be
referred to below as ST1 and ST2.



3.3. A simulation study for comparison of the tests 

We set up a simulation experiment to compare five procedures for testing (1.2): the classical
LRT with an asymptotic $\chi^2$ approximation, the associated Bartlett-Box correction (BBC) recalled
in (1.11), our corrected LRT (CLRT) introduced in §3, and the two tests ST1 and ST2 based
on least-squares type statistics of §3.2. Denote the non-centrality parameter by $\psi = c_0^2\psi_0$, where
$\psi_0 = \mathrm{tr}\big((B_1 - B_1^*)'\Sigma^{-1}(B_1 - B_1^*)\big)$ and $c_0$ is a varying constant. We then consider the model
(1.1) in the form $x_i = c_0(B_1 - B_1^*)z_i + \varepsilon_i$, $i = 1, \ldots, n$. Assume that the elements of $(B_1 - B_1^*)$
follow the distribution $N(1, 1)$. All the i.i.d. elements of $z_i$ in the model are sampled from $N(1, 0.5)$.
The errors $\varepsilon_i$ in (1.1) have a multivariate normal distribution $N_p(0, C)$ with

$$C = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\ \rho & 1 & \rho & \cdots & \rho^{p-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1 \end{pmatrix}.$$

Therefore, $\rho$ measures the degree of correlation between the $p$ coordinates of the noise vectors. To
understand the effect of these correlations on the test procedures, we consider two cases: $\rho = 0.9$
and $\rho = 0$.
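The error covariance used in the simulations has entries equal to the correlation parameter raised to the power $|i - j|$, so it reduces to the identity when that parameter is 0. A minimal sketch for building this matrix and drawing Gaussian errors from it is given below; it assumes NumPy, and the helper names `ar1_covariance` and `draw_errors` are hypothetical.

```python
import numpy as np


def ar1_covariance(p, rho):
    """p x p matrix with entries rho ** |i - j| (the identity when rho = 0)."""
    idx = np.arange(p)
    return float(rho) ** np.abs(idx[:, None] - idx[None, :])


def draw_errors(p, n, rho, rng):
    """Draw n i.i.d. N_p(0, C) error vectors, returned as the columns of a p x n array."""
    C = ar1_covariance(p, rho)
    L = np.linalg.cholesky(C)          # C = L L^T
    return L @ rng.standard_normal((p, n))


C = ar1_covariance(4, 0.9)
errors = draw_errors(p=10, n=50, rho=0.9, rng=np.random.default_rng(0))
print(C[0], errors.shape)
```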

For different values of $(p, n, q, q_1)$, we compute the realized sizes (Type I errors) of the five
tests based on 1,000 independent replications. All the tests are defined with a nominal (and
asymptotic) level $\alpha = 0.05$. The powers of the tests are evaluated under alternative hypotheses
obtained by varying the parameter $c_0$.

Table 1 gives the sizes (line $c_0 = 0$) and the powers ($c_0 \ne 0$) for the case $\rho = 0$ and
various choices of the dimensions $(p, n, q, q_1)$. Table 2 displays analogous results for the case $\rho = 0.9$,
where the coordinates of the noise sequence are highly correlated. The important conclusions from
these tables are as follows.

Test size: • The LRT and its BBC correction are highly inconsistent: in all considered cases, they
have a much higher size than the nominal value 5%. In
particular, the LRT systematically rejects the null hypothesis, even for a data dimension
as small as $p = 10$, while the BBC correction is only less biased, as expected.
• In the case where the coordinates of the noise are uncorrelated (Table 1), the three
tests CLRT, ST1 and ST2, which are based on RMT, achieve a correct level close
to 5%.

In contrast, when these correlations are high (Table 2), as the least-squares type tests
ST1 and ST2 heavily depend on an assumed absence of correlation between these coordinates,
these two tests become inconsistent.
The power function: In the case where the coordinates of the noise are uncorrelated (Table 1),
while all three are consistent, CLRT and ST2 outperform the test ST1.

When these coordinates are highly correlated (Table 2), and despite their inconsistency, the
tests ST1 and ST2 are outperformed by the CLRT. For example, in the case $\rho = 0.9$, $n = 200$,
$p = 30$, the highest powers of ST1 and ST2 are only 0.283 and 0.115, respectively.

To summarize, among the five tests considered here, only the CLRT displays an overall
consistency and a generally satisfactory power. In particular, this test is robust with regard to the
correlations between the coordinates of the noise process.

Lastly, Figures 1 and 2 give a dynamic view of these comparisons by varying the non-centrality
parameter $c_0$ for the cases $\rho = 0$ and $\rho = 0.9$, respectively. Note that the leftmost point of each line
represents the realized size (Type I error) of the test, and the other points are powers.

4. A high dimensional multiple sample significance test 

In this section we consider the following multiple sample significance test problem in a MANOVA
with high-dimensional data. For the two sample case, this problem has been considered by Dempster
(1958) and Bai and Saranadasa (1996). Here we treat the general multiple sample case. Consider
$q$ Gaussian populations $N(\mu^{(i)}, \Sigma)$ of dimension $p$, $1 \le i \le q$, and for each population, assume
that we have a sample of size $n_i$: $\{x_k^{(i)},\ 1 \le k \le n_i\}$. We wish to test the hypothesis

$$H_0 : \mu^{(1)} = \cdots = \mu^{(q)}. \tag{4.1}$$

High-dimensional here means that both the number $q$ of populations and the dimension $p$ of
the observation vectors are large with respect to the sample sizes $(n_i)$.
Clearly, the observations can be put in the form

$$x_k^{(i)} = \mu^{(i)} + \varepsilon_k^{(i)}, \qquad 1 \le i \le q, \quad 1 \le k \le n_i, \tag{4.2}$$

where $\{\varepsilon_k^{(i)}\}$ is an array of i.i.d. random vectors distributed as $N_p(0, \Sigma)$. We are going to embed the
test (4.1) into a special instance of the regression test (1.2). To this end, let $\{e_i\}$ be the canonical
basis of $\mathbb{R}^q$ and define the following regression vectors

$$z_k^{(i)} = [e_i + e_q]\,1_{\{i \ne q\}} + e_q\,1_{\{i = q\}}, \qquad 1 \le i \le q, \quad 1 \le k \le n_i.$$
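Define moreover the $p \times q$ matrix $B = (B_1, B_2)$ with

$$B_1 = \big(\mu^{(1)} - \mu^{(q)}, \ldots, \mu^{(q-1)} - \mu^{(q)}\big), \tag{4.3}$$
$$B_2 = \mu^{(q)}. \tag{4.4}$$

Note that the dimension $q$ is split into $(q_1, q_2) = (q - 1, 1)$ in the above decomposition.
Therefore, the observations follow a linear model

$$x_k^{(i)} = Bz_k^{(i)} + \varepsilon_k^{(i)}, \qquad 1 \le i \le q, \quad 1 \le k \le n_i. \tag{4.5}$$

The multiple sample test (4.1) is equivalent to the following regression test

$$H_0 : B_1 = 0. \tag{4.6}$$

In order to apply Theorem 3.1, we now identify the likelihood ratio statistic $\Lambda_n$ defined in (1.8).
Here we denote $n = \sum_{i=1}^q n_i$. Under the null hypothesis, the likelihood estimates of $(B_2, \Sigma)$ are (see
Anderson (2003) for details of the computation)

$$\hat B_{20} = \hat\mu = \bar x := \frac{1}{n}\sum_{i,k} x_k^{(i)}, \tag{4.7}$$

$$\hat\Sigma_0 = \frac{1}{n}\sum_{i,k}(x_k^{(i)} - \bar x)(x_k^{(i)} - \bar x)'. \tag{4.8}$$

On the other hand, under the alternative hypothesis, the likelihood estimates of $(\mu^{(i)}, \Sigma)$ are

$$\hat\mu^{(i)} = \bar x^{(i)} := \frac{1}{n_i}\sum_{k=1}^{n_i} x_k^{(i)}, \qquad 1 \le i \le q, \tag{4.9}$$

$$\hat\Sigma = \frac{1}{n}\sum_{i,k}(x_k^{(i)} - \bar x^{(i)})(x_k^{(i)} - \bar x^{(i)})'. \tag{4.10}$$

The likelihood ratio statistic $\Lambda_n = |\hat\Sigma|/|\hat\Sigma_0|$ readily follows.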
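In the multiple sample setting, the likelihood ratio statistic is the ratio of the determinant of the within-group scatter matrix to that of the total scatter matrix. The sketch below computes its logarithm together with the two dimension ratios $p/(q-1)$ and $p/(n-q)$ that enter the corrected test. NumPy, the function name and the toy sample sizes are assumptions of this example.

```python
import numpy as np


def manova_log_lambda(samples):
    """samples: list of q arrays of shape (n_i, p). Returns (log Lambda_n, y_n1, y_n2)."""
    q = len(samples)
    pooled = np.vstack(samples)
    n, p = pooled.shape
    centered_total = pooled - pooled.mean(axis=0)
    total = centered_total.T @ centered_total                 # n * Sigma_hat_0
    within = sum((s - s.mean(axis=0)).T @ (s - s.mean(axis=0)) for s in samples)  # n * Sigma_hat
    log_lambda = np.linalg.slogdet(within)[1] - np.linalg.slogdet(total)[1]
    return log_lambda, p / (q - 1), p / (n - q)


rng = np.random.default_rng(2)
samples = [rng.normal(size=(10, 3)) for _ in range(6)]      # q = 6 groups, p = 3, equal means
log_lambda, y_n1, y_n2 = manova_log_lambda(samples)
print(log_lambda, y_n1, y_n2)
```

The factors $1/n$ in the two covariance estimates cancel in the ratio, which is why the scatter matrices are used directly.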
By application of Theorem 3.1, we have the following

Proposition 4.1. For the multiple sample significance test (4.1), assume that $q \to \infty$, $n_i \to \infty$,
$1 \le i \le q$, $p \to \infty$ in such a manner that

$$y_{n_1} := \frac{p}{q-1} \to y_1 \in (0,1), \qquad y_{n_2} := \frac{p}{n-q} \to y_2 \in (0,1). \tag{4.11}$$

Then, for the same function $f$ defined in (3.3), we have

$$T_n = v(f)^{-\frac12}\big[-\log\Lambda_n - p \cdot F_{y_{n_1},y_{n_2}}(f) - m(f)\big] \Rightarrow N(0,1),$$

where $v(f)$, $m(f)$ and $F_{y_{n_1},y_{n_2}}(f)$ are defined in (3.6), (3.5) and (3.8) respectively, with the values
of $y_{n_1}, y_{n_2}, y_1, y_2$ defined in (4.11).



Z. Bai, D. Jiang, J. Yao and S. Zheng/Testing in high- dimensional regressions 14 

It is worth noticing here that the classical likelihood ratio test (LRT) for testing (4.1) relies
on the following weak convergence theorem: under $H_0$ and assuming fixed $p$ and $q$ while letting
$n_i \to \infty$,

$$-n\log\Lambda_n \Rightarrow \chi^2_{p(q-1)}. \tag{4.12}$$

Inevitably, in the high-dimensional case, $U_n$ will drift to infinity by Proposition 4.1. Consequently,
this classical $\chi^2$-approximation will lead to a test size much higher than a given nominal test
level, exactly as for the general linear hypothesis considered in §3.



5. Proofs 

Proof of (3.5) and (3.6):

Because the $x_i$ are Gaussian variables, for the real case $\beta = E|\xi_{11}|^4 - 3 = 0$, and then (2.6), (2.7) and (2.9)
are all 0. Consider (2.5) and (2.8). As $y_{n_k} \to y_k$, $k = 1, 2$, we will see in the course of applying Lemma 2.1
that constants and terms approaching zero do not affect the contour integration results, so
in practice we may take $y_{n_k} = y_k$, $k = 1, 2$. So we use

$$f(x) = \log\Big(1 + \frac{y_2}{y_1}x\Big)$$

instead of $f(x) = \log\big(1 + \frac{y_{n_2}}{y_{n_1}}x\big)$. Make the substitution $x = (1-y_2)^{-2}(1 + h^2 - 2h\cos\theta)$, where $z(\xi) =
(1-y_2)^{-2}\big[1 + h^2 + 2h\,\Re(\xi)\big]$, $h = \sqrt{y_1 + y_2 - y_1y_2}$. Because

$$\log\Big(1 + \frac{y_2}{y_1}z(\xi)\Big) = \log\big(|c + d\xi|^2\big),$$

where

$$c, d = \frac12\Big[\sqrt{1 + \frac{y_2}{y_1}b} \pm \sqrt{1 + \frac{y_2}{y_1}a}\,\Big], \quad c > d, \qquad a, b = \frac{(1 \mp h)^2}{(1-y_2)^2},$$

is the solution of the equation (2.10) with the constants of Lemma 2.1 set to $a = \alpha = 1$ and $b = \beta = y_2/y_1$,
Lemma 2.1 gives

$$m(f) = \frac12\log\frac{(c^2 - d^2)h^2}{(ch - y_2d)^2}, \qquad v(f) = 2\log\Big(\frac{c^2}{c^2 - d^2}\Big)$$

for the real case.



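Proof of $F_{y_{n_1},y_{n_2}}(f)$, Eq. (3.8):

For this computation we drop the indexes $n_1$ and $n_2$ in the parameters $y_{n_j}$ and compute the
integral $F_{y_1,y_2}(f)$. Following a device designed in Zheng (2008) (Lemma A.2), let $\underline{m}(z)$ be the
Stieltjes transform of the distribution function $\underline{F} := (1-y_1)1_{[0,\infty)} + y_1F_{y_1,y_2}$. For $r > 1$ but very
close to 1 and $|\xi| = 1$, we use a change of variable $z = \phi(\xi)$ which is implicitly defined by the
formula $m_0(z) = -(1 + hr\xi)/(1-y_2)$, and we have the following relations

$$z = -\frac{m_0(z)\,(m_0(z) + 1 - y_1)}{(1-y_2)\big(m_0(z) + \frac{1}{1-y_2}\big)}, \qquad \underline{m}(z) = \frac{(1-y_2)\big(m_0(z) + \frac{1}{1-y_2}\big)}{m_0(z)\,(m_0(z) + 1)}.$$

Or equivalently,

$$z = \frac{1 + h^2 + hr^{-1}\bar\xi + hr\xi}{(1-y_2)^2} \qquad\text{and}\qquad \underline{m}(z) = -\frac{(1-y_2)^2\,\xi}{hr\,\big(\xi + \frac{1}{hr}\big)\big(\xi + \frac{y_2}{hr}\big)}.$$

This shows that when $\xi$ runs anticlockwise along the unit circle, $z$ runs anticlockwise along a contour
which closely encloses the interval $[a, b]$ when $r$ is close to 1, where $a = \frac{(1-h)^2}{(1-y_2)^2}$ and $b = \frac{(1+h)^2}{(1-y_2)^2}$.
So we obtain, letting $r \downarrow 1$,

$$F_{y_1,y_2}(f) = \int_a^b f(x)\,\frac{(1-y_2)\sqrt{(b-x)(x-a)}}{2\pi x\,(y_1 + y_2x)}\,dx$$

$$= -\frac{y_1^{-1}}{2\pi i}\oint_{\mathcal{C}} f(z)\,\underline{m}(z)\,dz \qquad (\text{any contour } \mathcal{C} \text{ enclosing the interval } [a, b])$$

$$= \frac{1}{2\pi i\,y_1}\oint_{|\xi|=1}\big[\log(c + d\xi) + \log(c + d\bar\xi)\big]\,\frac{\xi^2 - 1}{\xi\,\big(\xi + \frac1h\big)\big(\xi + \frac{y_2}{h}\big)}\,d\xi$$

$$= \frac{1}{2\pi i\,y_1}\oint_{|\xi|=1}\log(c + d\xi)\Big[\frac{\xi^2 - 1}{\xi\,\big(\xi + \frac1h\big)\big(\xi + \frac{y_2}{h}\big)} - \frac{h^2(\xi^2 - 1)}{y_2\,\xi\,(\xi + h)\big(\xi + \frac{h}{y_2}\big)}\Big]d\xi \qquad (\text{making } \bar\xi \to \xi \text{ in the second integral})$$

$$= \frac{y_2 - 1}{y_2}\log(c) + \frac{y_1 - 1}{y_1}\log(c - dh) + \frac{y_1 + y_2}{y_1y_2}\log\Big(\frac{ch - dy_2}{h}\Big),$$

where

$$c, d = \frac12\Big[\sqrt{1 + \frac{y_2}{y_1}b} \pm \sqrt{1 + \frac{y_2}{y_1}a}\,\Big], \qquad c > d.$$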
$\rho = 0$

        (p, n, q, q_1) = (10, 100, 50, 30)             (p, n, q, q_1) = (20, 100, 60, 50)
c_0     LRT      CLRT     BBC      ST1      ST2        LRT      CLRT     BBC      ST1      ST2
0       1        0.056    0.101    0.070    0.086      1        0.047    0.672    0.042    0.072
0.01    1        0.064    0.113    0.071    0.096      1        0.084    0.741    0.044    0.129
0.02    1        0.083    0.150    0.080    0.136      1        0.203    0.879    0.050    0.395
0.03    1        0.150    0.224    0.098    0.222      1        0.381    0.963    0.063    0.851
0.04    1        0.247    0.342    0.125    0.387      1        0.583    0.992    0.091    0.998
0.05    1        0.382    0.500    0.156    0.588      1        0.784    0.999    0.127    1
0.06    1        0.574    0.676    0.200    0.792      1        0.914    1        0.173    1
0.07    1        0.747    0.829    0.279    0.932      1        [illeg.] 1        0.257    1
0.08    1        [illeg.] 0.925    0.375    [illeg.]   1        [illeg.] 1        0.374    1
0.09    1        [illeg.] 0.980    0.496    [illeg.]   1        [illeg.] 1        0.526    1
0.10    1        [illeg.] 0.990    0.624    1          1        [illeg.] 1        0.681    1

$\rho = 0$

        (p, n, q, q_1) = (30, 200, 80, 60)             (p, n, q, q_1) = (50, 200, 80, 70)
c_0     LRT      CLRT     BBC      ST1      ST2        LRT      CLRT     BBC      ST1      ST2
0       1        0.060    0.178    0.054    0.062      1        0.056    0.495    0.036    0.048
0.003   1        0.062    0.190    0.055    0.065      1        0.063    0.551    0.040    0.065
0.006   1        0.078    0.221    0.060    0.083      1        0.099    0.668    0.042    0.135
0.009   1        0.106    0.276    0.068    0.123      1        0.210    0.797    0.048    0.372
0.012   1        0.164    0.357    0.071    0.229      1        0.363    0.908    0.060    0.734
0.015   1        0.232    0.462    0.082    0.352      1        0.560    0.972    0.073    0.974
0.018   1        0.348    0.584    0.097    0.501      1        0.742    0.991    0.103    0.999
0.021   1        0.483    0.725    0.131    0.715      1        0.871    0.998    0.152    1
0.024   1        0.616    0.831    0.182    0.874      1        0.939    0.999    0.207    1
0.027   1        0.771    0.911    0.241    0.970      1        0.984    1        0.304    1
0.03    1        0.872    0.954    0.325    0.993      1        0.995    1        0.414    1

Table 1

Sizes ($c_0 = 0$) and powers ($c_0 \ne 0$) of the five tests, based on 1,000 independent replications with real
Gaussian variables. The parameter $\rho$ in the covariance matrix of the errors equals 0.
$\rho = 0.9$

        (p, n, q, q_1) = (10, 100, 50, 30)             (p, n, q, q_1) = (20, 100, 60, 50)
c_0     LRT      CLRT     BBC      ST1      ST2        LRT      CLRT     BBC      ST1      ST2
0       1        0.056    0.089    0.105    0.119      1        0.055    0.681    0.087    0.155
0.005   1        0.063    0.099    0.106    0.121      1        0.063    0.696    0.088    0.164
0.010   1        0.078    0.123    0.107    0.124      1        0.089    0.762    0.089    0.187
0.015   1        0.110    0.162    0.109    0.134      1        0.165    0.849    0.091    0.220
0.020   1        0.164    0.234    0.111    0.143      1        0.261    0.923    0.093    0.261
0.025   1        0.253    0.355    0.116    0.161      1        0.458    0.974    0.095    0.323
0.030   1        0.388    0.491    0.118    0.182      1        0.690    0.999    0.099    0.408
0.035   1        0.562    0.652    0.123    0.215      1        0.878    1        0.101    0.503
0.040   1        0.724    0.811    0.130    0.250      1        0.963    1        0.105    0.610
0.045   1        0.873    0.926    0.136    0.284      1        0.998    1        0.110    0.704
0.050   1        0.951    0.979    0.111    0.343      1        1        1        0.115    0.801

$\rho = 0.9$

        (p, n, q, q_1) = (30, 200, 80, 60)             (p, n, q, q_1) = (50, 200, 80, 70)
c_0     LRT      CLRT     BBC      ST1      ST2        LRT      CLRT     BBC      ST1      ST2
0       1        0.054    0.181    0.089    0.105      1        0.059    0.520    0.098    0.100
0.002   1        0.059    0.197    0.090    0.106      1        0.060    0.536    0.099    0.107
0.004   1        0.074    0.223    0.090    0.109      1        0.079    0.604    0.100    0.116
0.006   1        0.113    0.288    0.091    0.115      1        0.140    0.697    0.101    0.136
0.008   1        0.178    0.400    0.091    0.126      1        0.233    0.811    0.102    0.175
0.010   1        0.287    0.530    0.092    0.140      1        0.409    0.913    0.104    0.230
0.012   1        0.445    0.691    0.093    0.161      1        0.633    0.979    0.107    0.300
0.014   1        0.643    0.840    0.097    0.180      1        0.826    0.993    0.114    0.379
0.016   1        0.821    0.939    0.101    0.202      1        0.953    1        0.118    0.481
0.018   1        0.937    0.986    0.107    0.238      1        0.992    1        0.125    0.597
0.020   1        0.987    0.996    0.115    0.283      1        1        1        0.131    0.694

Table 2

Sizes ($c_0 = 0$) and powers ($c_0 \ne 0$) of the five tests, based on 1,000 independent replications with real
Gaussian variables. The parameter $\rho$ in the covariance matrix of the errors equals 0.9.
Figure 1. Sizes ($c_0 = 0$) and powers ($c_0 \ne 0$) of the four methods, namely the corrected LRT (CLRT), the
Bartlett-Box correction (BBC) and the two least-squares type tests (ST1 and ST2), based on 1,000 independent
replications using Gaussian error variables from $N(0, C)$ with the parameter $\rho = 0$.
Top row: $(p, n, q, q_1) = (10, 100, 50, 30)$ and $(20, 100, 60, 50)$.
Bottom row: $(p, n, q, q_1) = (30, 200, 80, 60)$ and $(50, 200, 80, 70)$.

Figure 2. Sizes ($c_0 = 0$) and powers ($c_0 \ne 0$) of the four methods, namely the corrected LRT (CLRT), the
Bartlett-Box correction (BBC) and the two least-squares type tests (ST1 and ST2), based on 1,000 independent
replications using Gaussian error variables from $N(0, C)$ with the parameter $\rho = 0.9$.
Top row: $(p, n, q, q_1) = (10, 100, 50, 30)$ and $(20, 100, 60, 50)$.
Bottom row: $(p, n, q, q_1) = (30, 200, 80, 60)$ and $(50, 200, 80, 70)$.



